Using Stochastic Helmholtz Machine for Text Learning

نویسندگان

  • Jeong-Ho Chang
  • Byoung-Tak Zhang
چکیده

We present an approach for text analysis, especially for topic words extraction and document classification, based on a probabilistic generative model. Generative models are useful since they can extract the underlying causal structure of data objects. For this model, a stochastic Helmholtz machine is used and it is fitted using the wake-sleep algorithm, a simple stochastic learning algorithm. Given a document set, the Helmholtz machine tries to capture the correlation of the words used in the set, thus can extract various semantic features for a set of documents. We present some experimental results on topic words extraction for TDT-2 and TREC-8 ad-hoc data sets. And for another real-world document set, 20 Newsgroup collection, a categorization is performed and the performance is compared with that of naïve Bayes classifier, another simple generative model. Additionally, we present a preliminary work to make Helmholtz machines more appropriate for processing text documents.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Emotion Detection in Persian Text; A Machine Learning Model

This study aimed to develop a computational model for recognition of emotion in Persian text as a supervised machine learning problem. We considered Pluthchik emotion model as supervised learning criteria and Support Vector Machine (SVM) as baseline classifier. We also used NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...

متن کامل

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people interact to express their thoughts or feelings about different subjects. However, some of users may write offensive twits to other via social media which known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...

متن کامل

Two-stage fuzzy-stochastic programming for parallel machine scheduling problem with machine deterioration and operator learning effect

This paper deals with the determination of machine numbers and production schedules in manufacturing environments. In this line, a two-stage fuzzy stochastic programming model is discussed with fuzzy processing times where both deterioration and learning effects are evaluated simultaneously. The first stage focuses on the type and number of machines in order to minimize the total costs associat...

متن کامل

Joint Stochastic Approximation learning of Helmholtz Machines

Though with progress, model learning and performing posterior inference still remains a common challenge for using deep generative models, especially for handling discrete hidden variables. This paper is mainly concerned with algorithms for learning Helmholz machines, which is characterized by pairing the generative model with an auxiliary inference model. A common drawback of previous learning...

متن کامل

Computing with Uncertainty in Probabilistic Neural Networks on Silicon

This paper suggests that probabilistic VLSI architectures may provide insights into naturally-stochastic processes, and describes a particular application to perform robust data fusion, using unsupervised feature extraction and compensation of sensor drift. Two very interesting, stochastic networks are (1) the Helmholtz machine, which uses a local learning rule and whose hidden units choose sta...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001